Variational Planning for Graph-based MDPs
Authors
Abstract
Markov Decision Processes (MDPs) are extremely useful for modeling and solving sequential decision-making problems. Graph-based MDPs provide a compact representation for MDPs with large numbers of random variables. However, the complexity of exactly solving a graph-based MDP usually grows exponentially in the number of variables, which limits their application. We present a new variational framework to describe and solve the planning problem of MDPs, and derive both exact and approximate planning algorithms. In particular, by exploiting the graph structure of graph-based MDPs, we propose a factored variational value iteration algorithm in which the value function is first approximated by a product of local-scope value functions, then solved by minimizing a Kullback-Leibler (KL) divergence. The KL divergence is optimized using the belief propagation algorithm, with complexity exponential only in the cluster size of the graph. Experimental comparison on different models shows that our algorithm outperforms existing approximation algorithms at finding good policies.
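For orientation, the exact planning baseline that the abstract contrasts against can be sketched as ordinary tabular value iteration. This is a minimal illustration and not the paper's factored variational algorithm: the function name `value_iteration`, the array layout `P[a][s][t]`, and the two-state toy problem are all assumptions made for the example. It shows why exact planning scales poorly: the table `V` has one entry per joint state, so a model with n binary variables would need 2**n entries.

```python
def q_value(P, R, V, gamma, s, a):
    """One-step lookahead: immediate reward plus discounted expected value."""
    return R[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(len(V)))


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Plain tabular value iteration (hypothetical helper, not from the paper).

    P[a][s][t] is the probability of moving from state s to state t under
    action a; R[s][a] is the immediate reward for taking action a in state s.
    """
    n_states, n_actions = len(R), len(R[0])
    V = [0.0] * n_states
    for _ in range(max_iter):
        # Bellman optimality backup over every joint state.
        V_new = [
            max(q_value(P, R, V, gamma, s, a) for a in range(n_actions))
            for s in range(n_states)
        ]
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    # Greedy policy with respect to the converged value function.
    policy = [
        max(range(n_actions), key=lambda a: q_value(P, R, V, gamma, s, a))
        for s in range(n_states)
    ]
    return V, policy


# Toy problem (assumed for illustration): action 0 stays, action 1 switches
# state; being in state 1 pays reward 1, being in state 0 pays nothing.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
R = [[0.0, 0.0], [1.0, 1.0]]
V, policy = value_iteration(P, R)
print(V, policy)
```

In this toy problem the greedy policy switches out of state 0 and stays in state 1. The factored approach described in the abstract avoids enumerating joint states like this by working with local-scope value functions instead.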
Similar resources
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...
Graph Convergence for H(.,.)-co-Accretive Mapping with over-Relaxed Proximal Point Method for Solving a Generalized Variational Inclusion Problem
In this paper, we use the concept of graph convergence of H(.,.)-co-accretive mapping introduced by [R. Ahmad, M. Akram, M. Dilshad, Graph convergence for the H(.,.)-co-accretive mapping with an application, Bull. Malays. Math. Sci. Soc., doi: 10.1007/s40840-014-0103-z, 2014] and define an over-relaxed proximal point method to obtain the solution of a generalized variational inclusion problem ...
Extending Classical Planning Heuristics to Probabilistic Planning with Dead-Ends
Recent domain-determinization techniques have been very successful in many probabilistic planning problems. We claim that traditional heuristic MDP algorithms have been unsuccessful due mostly to the lack of efficient heuristics in structured domains. Previous attempts like mGPT used classical planning heuristics to an all-outcome determinization of MDPs without discount factor; yet, discounte...
Variable Independence in Markov Decision Problems
In decision-theoretic planning, the problem of planning under uncertainty is formulated as a multidimensional, or factored MDP. Traditional dynamic programming techniques are inefficient for solving factored MDPs whose state and action spaces are exponential in the number of the state and action variables, correspondingly. We focus on exploiting problems' structure imposed by variable independence...
Structure Learning in Ergodic Factored MDPs without Knowledge of the Transition Function's In-Degree
This paper introduces Learn Structure and Exploit RMax (LSE-RMax), a novel model-based structure learning algorithm for ergodic factored-state MDPs. Given a planning horizon that satisfies a condition, LSE-RMax provably guarantees a return very close to the optimal return, with a high certainty, without requiring any prior knowledge of the in-degree of the transition function as input. LSE-RMax...
Journal title:
Volume / Issue
Pages -
Publication date: 2013